Information-preserving hybrid data reduction based on fuzzy-rough techniques

Authors

  • Qinghua Hu
  • Daren Yu
  • Zongxia Xie
Abstract

Data reduction plays an important role in machine learning and pattern recognition with high-dimensional data. In real-world applications data usually exists in hybrid formats, so a unified data reduction technique for hybrid data is desirable. In this paper, an information measure is proposed to compute the discernibility power of a crisp equivalence relation or a fuzzy one, which is the key concept in the classical rough set model and the fuzzy-rough set model. Based on this information measure, a general definition of the significance of nominal, numeric and fuzzy attributes is presented. We redefine the independence of a hybrid attribute subset, the reduct, and the relative reduct. Two greedy reduction algorithms, for unsupervised and supervised dimensionality reduction, are then constructed on the proposed information measure. Experiments show that the reducts found by the proposed algorithms perform better than those found by classical rough set approaches. © 2005 Elsevier B.V. All rights reserved.
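The abstract does not state the information measure or the greedy algorithms in detail, so the following is only a minimal sketch of how such a scheme could look, not the authors' implementation. It assumes an entropy-style measure H(R) = -(1/n) Σ log2(|[x_i]_R| / n), where |[x_i]_R| is the sum of row i of the relation matrix, an element-wise minimum to combine relations, and a simple linear similarity for numeric attributes. All function names and the toy data are hypothetical.

```python
import math

def crisp_relation(values):
    """Nominal attribute: r_ij = 1 if the two values match, else 0."""
    return [[1.0 if a == b else 0.0 for b in values] for a in values]

def fuzzy_relation(values):
    """Numeric attribute: assumed similarity r_ij = max(0, 1 - |x_i - x_j| / range)."""
    span = (max(values) - min(values)) or 1.0
    return [[max(0.0, 1.0 - abs(a - b) / span) for b in values] for a in values]

def intersect(r1, r2):
    """Relation induced by two attribute sets together (element-wise minimum)."""
    return [[min(x, y) for x, y in zip(row1, row2)] for row1, row2 in zip(r1, r2)]

def entropy(rel):
    """Assumed measure: H(R) = -(1/n) * sum_i log2(|[x_i]_R| / n)."""
    n = len(rel)
    return -sum(math.log2(sum(row) / n) for row in rel) / n

def conditional_entropy(rel_d, rel_b):
    """H(D | B) = H(B and D) - H(B)."""
    return entropy(intersect(rel_b, rel_d)) - entropy(rel_b)

def greedy_reduct(relations, rel_d, eps=1e-9):
    """Supervised forward selection: add the attribute that lowers H(D | B) most."""
    n = len(rel_d)
    current = [[1.0] * n for _ in range(n)]  # empty subset: nothing is discerned
    best_h = conditional_entropy(rel_d, current)
    remaining = dict(relations)
    selected = []
    while remaining:
        name, h = min(
            ((a, conditional_entropy(rel_d, intersect(current, r)))
             for a, r in remaining.items()),
            key=lambda pair: pair[1],
        )
        if best_h - h <= eps:  # no remaining attribute is significant
            break
        selected.append(name)
        current = intersect(current, remaining.pop(name))
        best_h = h
    return selected

# Hypothetical hybrid toy data: two nominal attributes and one numeric attribute.
relations = {
    "color": crisp_relation(["r", "r", "g", "g"]),
    "size": fuzzy_relation([0.1, 0.9, 0.2, 0.8]),
    "noise": crisp_relation(["x", "x", "x", "x"]),
}
decision = crisp_relation(["a", "b", "a", "b"])
selected = greedy_reduct(relations, decision)
```

In this sketch the significance of an attribute is the drop in conditional entropy it causes, so a constant attribute such as `noise` has zero significance and is never selected. Nominal and numeric attributes are handled uniformly because both are reduced to a relation matrix before any entropy is computed.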

Related resources

Hybrid Attribute Reduction for Classification Based on A Fuzzy Rough Set Technique

Data usually exists in hybrid formats in real-world applications, and a unified data reduction technique for hybrid data is desirable. In this paper a unified information measure is proposed to compute the discernibility power of a crisp equivalence relation and a fuzzy one, which is the key concept in the classical rough set model and the fuzzy rough set model. Based on the information measure, a general defini...

A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts

High-dimensional microarray datasets are difficult to classify since they have many features with a small number of instances and an imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...

Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation

Feature subset selection has become an important challenge in pattern recognition, machine learning and data mining. As different semantics are hidden in numerical and categorical features, there are two strategies for selecting hybrid attributes: discretizing numerical variables or numericalizing categorical features. In this paper, we introduce a simple and efficient hybrid attribute r...

A fuzzy rough set approach for incremental feature selection on hybrid information systems

In real applications, an information system may contain many kinds of data (e.g., boolean, categorical, real-valued and set-valued data) as well as missing data; such a system is called a Hybrid Information System (HIS). A new Hybrid Distance (HD) in HIS is developed based on the value difference metric, and a novel fuzzy rough set is constructed by combining the HD distance and the Gaussian kernel. ...

Rough and Fuzzy Sets for Dimensionality Reduction

One of the main obstacles facing current machine learning techniques is that of dataset dimensionality. Usually, a redundancy-removing step is carried out beforehand to enable these techniques to be effective. Rough Set Theory (RST) has been used as such a dataset pre-processor with much success; however, it is reliant upon a discretized dataset, and important information may be lost as a result of ...

Journal:
  • Pattern Recognition Letters

Volume 27

Publication date: 2006